Avatar Kiosk



Technical Field 

This invention relates to an apparatus and method for scanning a person in three 
dimensions and creating an Avatar of that person. 

Background Art 

Several person scanning machines have been built to scan the surface of a 
person. Some person scanning machines are limited in scope to parts of a person 
such as the head or hand. 

The amount of time to scan a person varies in different machines. Some person 
scanning machines capture all the data quickly enough to freeze any movement. 
Other machines take several seconds. 

Since parts of the human body obscure other parts of the body, it is necessary to 
either have multiple sensors or relative movement between the sensor and the 
person. The main methods of movement that have been used in person scanning 
machines are cylindrical rotation of the person or linear translation of several 
sensors. 

All machines allow the person to adopt any posture during scanning. However,
most person scanning machines have a restricted data capture volume and any 
parts of the body outside that volume are missed. 

All person scanning machines capture a representation of the 3D shape of the 
body. Some machines capture other information such as colour as well. 

Most of these person scanning machines are very large in size and often occupy a 
dedicated room. None of these machines are enclosed in a small kiosk. Passport 
photo kiosks are common in public places. They are usually enclosed, apart from 
an access place for a person. 

All of these person scanning machines have been designed for a research or 
laboratory environment. They are usually not robust and are not designed for dirty
environments in which they can be knocked and subject to high degrees of wear 
and tear.

Since most person scanning machines use sensors that are a long distance away
from the subject and allow the subject to adopt any posture, it is usual for data in
many areas to be obscured.

All existing person scanning machines require highly trained personnel to operate 
them. None are intended for use by a passer-by. 

None of the systems include electronic weighing means. Weight cannot be 
accurately estimated from the known 3D shape of a person, because different 
people have different amounts of fat, bone and tissue, all of which have different
densities. 
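
The density argument can be illustrated with a small worked calculation. This is a sketch only: the two density constants are common textbook approximations and the 70 litre volume is an invented figure, none of which come from this specification.

```python
# Assumed two-compartment densities in kg per litre (textbook approximations,
# not values taken from this specification).
FAT_DENSITY = 0.9
LEAN_DENSITY = 1.1

def body_mass_kg(volume_litres, fat_fraction):
    """Mass of a body of the given volume whose fat share (by volume) is fat_fraction."""
    mean_density = fat_fraction * FAT_DENSITY + (1.0 - fat_fraction) * LEAN_DENSITY
    return volume_litres * mean_density

# Two hypothetical people with the same scanned volume but different composition.
leaner = body_mass_kg(70.0, 0.15)
fatter = body_mass_kg(70.0, 0.35)
print(leaner, fatter)
```

Two people with an identical scanned volume can therefore differ in weight by several kilograms, which is why a direct weight measurement adds information that the 3D shape alone does not.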

The raw output of all these person scanning machines is a large number of 3D 
points and texture information in different forms. Special software is then used to 
convert the raw data into the required forms. Such special software needs a 
skilled operator to use it. 

3D Scanners Ltd in the UK developed a whole body scanner called PERSONA in 
1993. The person being scanned stood on a rotating table. Two sensors 
projected laser stripes through the axis of rotation of the table. As the person was 
turned, the shape of the body was captured using the principle of laser stripe
triangulation. Considerable amounts of surface are obscured using this design. 

Cyberware Labs Inc in the USA developed a whole body scanner in 1994. The 
person remained stationary whilst four sensors which projected horizontal laser 
stripes descended down gantries. The colour of the person was also recorded. 
The process takes around 17 seconds. This is a long time for a person to 
maintain the same position. Also, some data is not captured: for instance, the top
of the head and the tops of the shoulders are missed.

Several companies have developed 3D cameras which take a large number of 
measurements in a short period of time and have used several 3D cameras at the 
same time to scan a person. 3D cameras have never found commercial success, 
mainly due to technical limitations. 

There are several systems for capturing facial expressions. They mostly use 
ordinary cameras and landmarks stuck on key points on a person's face for use in 
real-time in which a virtual actor driven by a powerful computer mimics the 
person whose facial expressions are being recorded. 

Avatars, also known as virtual humans, are used to represent a person in a virtual 
environment. Many different 'anthropomorphic' software models of Avatars have 
been developed. No international standard or format for such an Avatar model
has been defined.

There are three main types of Avatar representations: sprites, polygons and high- 
level surfaces. Sprites are a pseudo-3D representation in which an Avatar consists 
of several 2D images taken from different angles. The nearest image to the 
direction of observation is used to represent an Avatar in a virtual environment. A 
human sprite Avatar can be created from a number of photos of that person. 
Sprites are mainly used in games where there is no close-up of the character
being represented. Most virtual environments are 3D and have close-up 
navigation. The sprite representation in a virtual environment is very poor when 
the viewpoint into the environment is close to the position of the sprite image. 
Polygons are commonly used and can include texture maps. High-level surface 
representations are not used very often. 
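
The sprite selection rule described above (use the image nearest to the direction of observation) might be sketched as follows. The function name and the layout of eight photos at 45 degree spacing are assumptions made for illustration.

```python
def nearest_sprite(view_angle_deg, sprite_angles_deg):
    """Return the index of the sprite whose capture angle is closest to the viewing angle."""
    def angular_gap(a, b):
        # Smallest absolute difference between two headings, in degrees (0 to 180).
        return abs((a - b + 180.0) % 360.0 - 180.0)

    return min(range(len(sprite_angles_deg)),
               key=lambda i: angular_gap(view_angle_deg, sprite_angles_deg[i]))

# Hypothetical sprite set: eight photos taken every 45 degrees around the person.
angles = [i * 45.0 for i in range(8)]
print(nearest_sprite(350.0, angles))
```

Because only a handful of fixed views exist, the displayed image jumps as the viewpoint moves between them, which is one reason the representation degrades at close range.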

It is very difficult to manually create polygon and high-level surface Avatars similar 
to real people. Using photos and a mouse, it can take many days of skilled work 
to create a poor representation of that person. 

Most Avatars in use in virtual worlds today bear no resemblance to the person 
they represent. The Avatars in use that look closest to the people they represent 
are sprites made from photos and standard polygonal models on which a bit- 
mapped image of the person's face is wrapped. 

In the real world, a person's identity is guaranteed by a card or passport which 
usually bears a photo of the person. Photos have been mandatory in passports in 
some European countries for over 80 years. Financial guarantees are often given 
in the form of credit cards with encoded information on them which are checked 
for validity against central records. On the Internet, a high-security encryption 
system is offered by many software vendors to secure information transmissions. 

Disclosure of Invention 

It is an object of this invention to provide a novel kiosk in which the Avatar of a 
person is automatically generated. 

The kiosk [1] is free-standing and has a central area [2] for a person [3] to stand 
in. Foot positioning guidance means [4] are provided on the floor [5] of the 
central area. Sensing equipment [6] is situated in front of and behind the person. 
The sensing equipment is connected to computing means [7]. A display [8] is 
connected to the computing means [9]. The computing means [9] is connected to 
an operator input means [11]. The computing means [9] is connected to a
telecommunications means [12]. The kiosk [1] is connected to a source of power
[13]. A means for reading encoded information [14] is connected to the 
computing means [9]. A printer [15] may be connected to the computing means 
[9] which outputs a printed document [16] into a tray [17]. Electronic weight 
measurement means [18] may be positioned under the foot guidance means [4] 
and be connected to the computing means [9]. A sound recording and digitising 
means [19] may be connected to the computing means [9]. A rolling walkway 
[20] could be installed in the floor [5] of the central area. 

The person stands in a specified posture [21] with his feet [22] positioned using 
the foot positioning guidance means [4] so that the person faces in the specified 
direction with his feet in the specified places. The specified posture may require 
that the arms [23] are held away from the body to either side such that the arms 
and the main body [24] are roughly in the same, approximately vertical plane P. 
The hands [25] may be specified to have the fingers [26] spread and the palms 
facing backwards (thumbs [27] pointing down) such that the hands and fingers 
are roughly in the same plane P. The legs [33] and the back [34] may be specified 
as being straight. A head positioning guidance means [28] may be used to locate 
the head [29] such that it is roughly vertical and not too far forward or back. 

An example of the foot positioning guidance means [4] is two footmark graphics 
on the floor [5] of the kiosk [1]. Such graphics give both position and direction in 
a clearly recognisable form. A second example is two, slightly raised foot-shaped 
areas. Such areas raise the feet slightly and enable them to be better scanned. 

An example of such head positioning guidance means is a beam of light [30] 
projected vertically downwards such that in the optimum posture, the person will 
see the incidence of the beam of light [31] with the tip of his nose [32] in a mirror 
[37]. 

An example illustration of the specified posture [21] which may comprise written 
words or graphics or both may be provided as an aid to the person to help him 
attain the specified posture. Alternatively or additionally, a sound projection 
means [36] may be connected to the computing means [9] to issue posture 
instructions which may either be pre-recorded or issued in a computer-generated 
voice. 

The user input means [11] may for example be a touch screen surface on the 
display [8] or a keyboard or a panel with a number of buttons or any combination 
of these or other user interface means. It is used to select any options the kiosk 
offers. The user may enter the electronic mail address to which the Avatar dataset 
should be sent or a password for identifying a retrieval from a central server.
More extensive identity information such as personal and financial information
may be entered.

The telecommunications means [12] is used to deliver the Avatar. It is also used 
to carry out the credit card transaction. It can also automatically notify a central 
service computer of any malfunction; similarly any lack of response to an 'are you 
there kiosk' enquiry from the central service computer can be checked by means 
of a visit. 
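
The 'are you there kiosk' check could be sketched on the central service computer as below. This is a minimal illustration under assumed names: the dictionary of last reply times, the kiosk identifiers and the one hour timeout are all hypothetical.

```python
def kiosks_needing_visit(last_reply_times, now, timeout_s=3600.0):
    """Return ids of kiosks whose most recent reply is older than timeout_s seconds."""
    return sorted(kiosk_id
                  for kiosk_id, replied_at in last_reply_times.items()
                  if now - replied_at > timeout_s)

# Hypothetical record of when each kiosk last answered an enquiry (seconds).
last_replies = {"kiosk-A": 1000.0, "kiosk-B": 4000.0}
print(kiosks_needing_visit(last_replies, now=5000.0))
```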

The means for reading encoded information [14] may be a credit card 'swipe' 
reader. The use of credit cards has the advantage of enabling high amounts to 
be charged for Avatar creation without making the kiosk [1] an attractive target for 
thieves which would be the case if coins or banknotes were inserted. A second 
advantage of credit cards is that the kiosk would not require regular visits to empty 
it of money. If electronic money becomes generally accepted then the kiosk would 
support that and other forms of credit card or cash would not be used.

The display [8] may be a computer monitor. It may be a large LCD flatscreen or 
any other form of pixel display technology. It is used for a variety of functions 
including: 

• offering options for the user to select 

• acknowledging selections / payments 

• echoing user input 

• confirming user electronic mail address availability 

• issuing instructions 

• displaying the resulting Avatar 

• displaying an address/password where the Avatar dataset may be downloaded 
from 

• advertising what the kiosk does pre-sale 

• advertising other products post-sale 

A printer [15] may provide the user with a take-away image of his Avatar dataset 
or a password and address for downloading his Avatar from a central server. 

The electronic weight measurement means [18] can measure the weight of the 
person and his clothes. Weight measurement would be more useful where 
minimal clothing is worn.

Sound recording and digitising means [19] may be used to capture the voice of 
the person as he makes sounds. A combination of speaking specified words and 
making known sounds such as laughing might enable the main characteristics of 
a person's voice to be determined such that he can be more accurately
represented to others in virtual environments, i.e. his Avatar sounds like him.

The rolling walkway [20] could be used to capture motion such as walking. The 
walkway could be motor driven or inactive. It could be uni-directional or
bi-directional. A bi-directional walkway would enable more complex motions such
as turning to be captured. The sensing equipment [6] would need to be able to 
identify and follow the motion of the person. Kiosks with motion capture 
capabilities would be larger than those without. If the kiosk cannot capture 
motion, then using the user input means [11], the person can specify the 
characteristic way in which he moves by answering questions and selecting a walk 
similar to his from a small library of electronic video displayed on the display [8]. 
It is difficult to know your own walk, so the presence of a friend may help. The 
question-answering and video selection process may be interactive with dynamic 
animations of the person's Avatar being shown after each question is answered. 

The sensing equipment [6] can capture the person's shape and colour over a 
period of time. It can also capture the person's face with several different 
expressions such as happy, sad, angry and laughing in several sequential 
sessions. Capturing expressions provides more information such that the reality of 
the person's Avatar may be enhanced. Application software can use the Avatar 
datasets including the person's expressions to visually recreate moods. 

The sensing equipment [6] can be of many different types. It can consist of a 
number of static 3D cameras. The 3D cameras could use fringe or other types of 
structured light. It could consist of a number of laser stripe sensors translated on 
two linear axes. Colour cameras and flashlights could be used to capture the 
colours of the person. 

The sensing equipment [6] needs to be registered so that the data from all the 
sensors can automatically be registered in one coordinate system. This would be 
done by means of a simple alignment procedure after installation. The best 
method is probably the capturing of a known artifact. It assumes that all the 
sensors are calibrated and in two known frames. One such artifact [40] consists 
of 8 spheres [41] in a known relation to each other. The 8 spheres are connected 
by four horizontal tubes [42] and a series of vertical wires [43]. A weight [44] on 
rubber string [45] rests on the floor to stop the artifact swinging. This artifact [40] 
can be hung from the top of the kiosk [46] and also has the capability of being 
stored in a relatively small travelling case. 
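
One way to bring a sensor's data into the kiosk coordinate system from the known artifact is sketched below. It is a simplification, not the method of this specification: it uses only three matched sphere centres (assumed to have been extracted already) and builds a rigid transform from them, whereas a practical procedure would more likely use a least-squares fit over all 8 spheres. All function names are hypothetical.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def frame(p0, p1, p2):
    """Orthonormal axes (three row vectors) built from three non-collinear points."""
    x = unit(sub(p1, p0))
    z = unit(cross(x, sub(p2, p0)))
    y = cross(z, x)
    return [x, y, z]

def registration(sensor_pts, kiosk_pts):
    """Return (R, t) with kiosk = R * sensor + t, from three matched sphere centres."""
    fs, fk = frame(*sensor_pts), frame(*kiosk_pts)
    # R maps each sensor-frame axis onto the corresponding kiosk-frame axis.
    R = [[sum(fk[k][i] * fs[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    rotated_origin = [sum(R[i][j] * sensor_pts[0][j] for j in range(3)) for i in range(3)]
    t = sub(kiosk_pts[0], rotated_origin)
    return R, t

def apply(R, t, p):
    """Transform one sensor-frame point into the kiosk frame."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

# Hypothetical sphere centres as seen by one sensor and as known in the kiosk frame.
sensor = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
kiosk = [(2.0, 3.0, 4.0), (2.0, 4.0, 4.0), (1.0, 3.0, 4.0)]
R, t = registration(sensor, kiosk)
print(apply(R, t, (0.0, 0.0, 1.0)))
```

Applying the resulting transform to every point from that sensor places its data in the common coordinate system; repeating this per sensor registers all sensors together.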

One embodiment of the sensing equipment [6] is two linear axes [50a, 50b] on 
either side of the person. The main data capture method is laser stripe. Laser 
stripe is low-cost, relatively fast and does not give ambiguous results. The sensing 
equipment mounted on a linear axis travels just over a quarter of the person's
height. There are eight sensors [57] on each axis. On each axis, sensors may be 
mounted in pairs to give better all round coverage, reduce stripe length and 
reduce occlusion. In this way the capture time is quartered, the time between 
scans for the sensing equipment to retract is quartered and a capture time of less 
than 5 seconds is achievable. This leads to a total of 16 sensors. Each sensor 
has one stripe [51] pointing at a small angle (5 to 10 degrees) upwards and a 
second stripe [52] pointing at a small angle (5 to 10 degrees) downwards. The 
stripes are switched so only one stripe is illuminated at a time thus removing the 
chance of ambiguous data. Some sensors have two cameras [53, 54]; other
sensors have one camera [either 53 or 54]. Each stripe may be generated by a
laser diode collimator [55] with a visible red 670 nm laser diode and a fixed 
cylindrical optic [58]. Each laser diode may be Class II which is eye-safe. The 
specification of a known, compact posture enables the sensing equipment to be 
made much smaller. 
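
Laser stripe measurement can be illustrated as the intersection of a camera ray with the known plane of laser light. The sketch below assumes a calibrated camera at the origin and a laser plane described by a point and a normal; these conventions, the function name and the numbers are assumptions for illustration only.

```python
def triangulate(ray_dir, plane_point, plane_normal):
    """Intersect a camera ray through the origin with the laser plane."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    denom = dot(plane_normal, ray_dir)
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the laser plane")
    # Scale factor along the ray at which it meets the plane.
    s = dot(plane_normal, plane_point) / denom
    return [s * c for c in ray_dir]

# Hypothetical geometry: laser emitter offset 0.2 m from the camera, plane tilted
# so that it satisfies y + 0.25 * z = 0.2; the pixel's ray points straight ahead.
point = triangulate((0.0, 0.0, 1.0), (0.0, 0.2, 0.0), (0.0, 1.0, 0.25))
print(point)
```

Because each pixel on the imaged stripe yields exactly one ray and only one stripe is lit at a time, each ray meets a single known plane, which is why the result is unambiguous.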

In this embodiment, 16 colour cameras are used with 16 flash lights to record the 
colour of the person. The colour images are taken at three points: at the top of 
the travel, in the middle and at the bottom. At each point, the eight colour 
cameras and flashes on one side operate synchronously and shortly afterwards 
the eight colour cameras and flashes on the other side operate synchronously. 
This avoids cameras on one side being blinded by the flashes on the other side. 
The 16 colour cameras [58] and flashes [59] may be mounted on the 16 sensors 
[57]. 
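
The firing order of the colour cameras and flashes described above might be generated as follows. The position and side labels and the camera numbering are hypothetical; only the ordering (three points of travel, one side then the other) follows the text.

```python
def flash_schedule(positions=("top", "middle", "bottom"),
                   sides=("side_a", "side_b"),
                   cameras_per_side=8):
    """List (position, side, camera ids) firing groups in the order described."""
    schedule = []
    for position in positions:
        for side_index, side in enumerate(sides):
            first = side_index * cameras_per_side
            camera_ids = list(range(first, first + cameras_per_side))
            # All cameras in one group fire together; the other side waits its turn.
            schedule.append((position, side, camera_ids))
    return schedule

for step in flash_schedule():
    print(step)
```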

In this embodiment, the two axes are motor driven by either servo or stepper 
motors. The ballscrew or other drive mechanism need only be just over a quarter 
of a person's height. A single linear bearing is used on each axis to maintain 
alignment, straightness and reduce cost. 
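
The relationship between axis travel and capture time can be shown with simple arithmetic. The sketch assumes four sensor tiers along each axis, a 0.05 m overlap margin and an axis speed of 0.12 m/s; the margin and speed are invented for illustration and are not figures from this specification.

```python
def axis_travel_m(person_height_m, tiers=4, margin_m=0.05):
    """Travel needed when `tiers` sensor groups share the person's height."""
    return person_height_m / tiers + margin_m

def capture_time_s(person_height_m, speed_m_per_s, tiers=4, margin_m=0.05):
    """Time for one scanning pass at constant axis speed."""
    return axis_travel_m(person_height_m, tiers, margin_m) / speed_m_per_s

# Hypothetical 2.0 m person scanned at an assumed 0.12 m/s.
print(axis_travel_m(2.0), capture_time_s(2.0, 0.12))
```

With a single tier the same axis would have to cover the full height, so both the travel and the pass time would be roughly four times larger.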

The large quantity of 3D points resulting from the scan and the colour data must 
be automatically converted into an Avatar. Most sensors also provide some 
estimate of surface normal for most 3D data points. 

A closed, texture mapped, 3D polygonal mesh Avatar can be automatically 
constructed from the data. One automatic algorithm pipeline that can be used is 
this: 

• for each sensor create a 2.5D triangular sub-mesh from the data for that 
sensor (these sub-meshes will often have holes in them) 

• populate a multi-resolution, volumetric data structure with the 2.5D triangular 
sub-meshes to form an implicit surface 

• use the implicit surface to generate a single polygonal mesh 



• fill any holes in the mesh

• decimate (polygon reduce) the polygonal mesh to reduce the quantity of data
to the desired amount 

• smooth the mesh using a median smoothing algorithm to maintain features 

• reverse render the colour images onto the mesh to produce texture maps 
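
The first step of the pipeline above, building a 2.5D triangular sub-mesh for one sensor, might look like the sketch below. It assumes the sensor's data has been arranged as a regular grid of range samples with None marking missing measurements; that layout and the function name are assumptions.

```python
def grid_to_triangles(depth):
    """Triangulate a 2.5D range grid; cells touching a missing sample are skipped."""
    rows, cols = len(depth), len(depth[0])
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            corners = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
            if any(depth[i][j] is None for i, j in corners):
                continue  # leave a hole here for the later hole-filling step
            top_left, top_right, bottom_left, bottom_right = corners
            # Split each complete grid cell into two triangles.
            triangles.append((top_left, top_right, bottom_left))
            triangles.append((top_right, bottom_right, bottom_left))
    return triangles

# Hypothetical 3 x 3 range grid with one missing sample in a corner.
grid = [[None, 1.0, 1.0],
        [1.0, 1.1, 1.0],
        [1.0, 1.0, 1.2]]
print(len(grid_to_triangles(grid)))
```

The holes left by skipped cells are the ones the pipeline later closes when the sub-meshes are merged and filled.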

This polygon Avatar is automatically generated and may be used at different levels 
of detail in a variety of virtual environments. The random nature of the polygons 
makes it difficult to use the Avatar for animation of body and face.

To animate an Avatar's body, the identification of joints and the construction of a 
jointed model is required. One automatic algorithm pipeline that can be used to 
achieve an animatable Avatar is this: 

• find landmarks from 3D feature detection using heuristic rules based on the 
known posture. Landmarks might include head, nose, ears, feet, hands, 
fingers 

• fit a 'standard' human jointed model between the landmarks 

• optimise the jointed model to the polygon model using a least energy 
approach 
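
Because the posture is known, very simple heuristic rules can locate some landmarks. The sketch below is illustrative only: it assumes z points up and x runs along the outstretched arms, and the landmark names are hypothetical.

```python
def find_landmarks(points):
    """Pick extreme points of a scan taken in the specified arms-out posture.

    points: list of (x, y, z) tuples, with z up and x along the arm span (assumed axes).
    """
    return {
        "head_top": max(points, key=lambda p: p[2]),        # highest point
        "hand_tip_min_x": min(points, key=lambda p: p[0]),  # one outstretched hand
        "hand_tip_max_x": max(points, key=lambda p: p[0]),  # the other hand
        "lowest_point": min(points, key=lambda p: p[2]),    # sole of a foot
    }

# Hypothetical, heavily simplified scan of four points (metres).
scan = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.8), (-0.9, 0.0, 1.4), (0.9, 0.0, 1.4)]
print(find_landmarks(scan))
```

Rules this simple would fail if the person could adopt any posture, which is one reason the specified posture matters for automatic processing.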

An Avatar's face can be animated based on a standard model or if several 
expressions of the person have been captured a special model can be generated. 
The identification of facial features is required. One automatic algorithm pipeline 
that can be used to achieve facial animation is this: 

• find landmarks on face using colour textures eg eyebrow ends, mouth corners, 
nose, eyes, chin 

• work out their position relative to each other in the different expressions 

• normalise the different expressions in space ie register them together 
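
The last two steps above could be sketched as follows. This is a deliberately crude illustration: the expressions are registered only by removing the centroid of the landmarks (rotation is ignored), and the landmark names and coordinates are invented.

```python
def centre(landmarks):
    """Shift a landmark set so that its centroid is at the origin."""
    n = len(landmarks)
    c = [sum(p[i] for p in landmarks.values()) / n for i in range(3)]
    return {name: tuple(p[i] - c[i] for i in range(3)) for name, p in landmarks.items()}

def expression_offsets(neutral, expression):
    """Per-landmark displacement of an expression relative to the neutral face."""
    a, b = centre(neutral), centre(expression)
    return {name: tuple(b[name][i] - a[name][i] for i in range(3)) for name in a}

# Hypothetical landmarks: the 'happy' capture was taken with the head displaced by
# (5, 5, 5) and the mouth corners raised by 0.3 relative to the chin.
neutral = {"mouth_left": (-1.0, 0.0, 0.0), "mouth_right": (1.0, 0.0, 0.0),
           "chin": (0.0, -2.0, 0.0)}
happy = {"mouth_left": (4.0, 5.3, 5.0), "mouth_right": (6.0, 5.3, 5.0),
         "chin": (5.0, 3.0, 5.0)}
print(expression_offsets(neutral, happy))
```

After registration the overall head displacement between sessions drops out and only the relative movement of the landmarks remains.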

At present there is no standard Avatar format or model. Each software 
application that uses avatars has its own format. Some formats are static, some 
animatable, some include expressions, some personal and financial information. 

An Avatar Kiosk can output all its Avatars in a standard format of its own 
specification: 'AVSTAN'. The AVSTAN format may use any or all of the following 
defined data structures and any number of data instances for the structures: 

• static shape topology 

• texture maps registered to the shape topology 

• landmarks indexed to the shape topology 

• standard joint topology

• positions of joints relative to the shape topology

• expression shape deformations of static shape topology eg angry, happy 

• characteristic motions eg walking 

• contact information eg e-mail address, geographical locations, telecomms 
numbers 

• financial information, spending habits 

• personal information such as weight, age, likes, habits, family 

• medical, health information 

• curriculum vitae (resume) 

• community record eg fines, imprisonments 

• relationship information eg psychological profile, attraction profile 

By fully defining the AVSTAN format and having it include a wide range of data, 
different software packages can then quickly generate a representation of a 
person entering their virtual environment in that software's format from the 
person's AVSTAN dataset and knowledge of the standard AVSTAN format. The 
advantage of a standard is that it is recognised in all environments worldwide. 
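
A few of the AVSTAN data structures listed above could be held in a container such as the one sketched below. The specification does not define field names or encodings, so every name here is hypothetical, and only a subset of the listed structures is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class AvstanDataset:
    """Hypothetical container mirroring some of the listed AVSTAN structures."""
    vertices: List[Vec3] = field(default_factory=list)                   # static shape
    triangles: List[Tuple[int, int, int]] = field(default_factory=list)  # shape topology
    landmarks: Dict[str, int] = field(default_factory=dict)              # name -> vertex index
    joints: Dict[str, Vec3] = field(default_factory=dict)                # name -> position
    expressions: Dict[str, List[Vec3]] = field(default_factory=dict)     # per-vertex offsets
    contact: Dict[str, str] = field(default_factory=dict)                # e.g. e-mail address
    weight_kg: Optional[float] = None

    def problems(self) -> List[str]:
        """Return consistency problems; an empty list means the indices line up."""
        found = []
        n = len(self.vertices)
        for tri in self.triangles:
            if any(i < 0 or i >= n for i in tri):
                found.append(f"triangle {tri} references a missing vertex")
        for name, index in self.landmarks.items():
            if not 0 <= index < n:
                found.append(f"landmark {name!r} references a missing vertex")
        for name, offsets in self.expressions.items():
            if len(offsets) != n:
                found.append(f"expression {name!r} has {len(offsets)} offsets for {n} vertices")
        return found

avatar = AvstanDataset(
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    triangles=[(0, 1, 2)],
    landmarks={"nose": 2},
)
print(avatar.problems())
```

Indexing landmarks and expression offsets to the shape topology, as the list suggests, means an importing application can check a dataset for internal consistency before converting it to its own format.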

What a person wears is relevant. Bulky clothes obscure a person's true shape. If 
an Avatar is generated of someone wearing bulky clothes, then it will not be 
possible to use the Avatar data for sizing in teleshopping or for many health and 
beauty applications. It will also be technically difficult for the Avatar to change 
into other virtual clothes. One solution is for a person to have a different Avatar 
made with each new clothes outfit and hairstyle. The alternative of few clothes 
requires the removal of most clothing. It is likely that an Avatar Kiosk would need 
to be situated where people are happy to remove their clothes such as in the 
changing area of a clothes shop or a sports centre. 

The user can specify whether and how much of his Avatar identity be published in 
a directory similar to a telephone directory. The user can also specify whether his 
Avatar identity be made available for direct marketing purposes. If the value of 
the data is high to direct marketing companies then the cost for using the Avatar 
Kiosk may be reduced. As the entrypoint to Cyberspace, the Avatar Kiosk is likely 
to be of high strategic value to many types of company.

The high quality of the Avatars produced in the kiosk will improve the reality of 
virtual environments and is likely to enhance their uptake by the public in different 
applications. Like passport photo kiosks, the cost is low. Unlike a passport photo,
for which a Polaroid can be taken at home, an Avatar cannot be made at home: the
specialised equipment in the Avatar kiosk is not available in homes and offices.
It is likely that the Avatar
kiosk will have high margins and a high market-share of all those people 
requiring Avatars. 



Brief Description of Drawings 



A specific embodiment of the invention will now be described with reference to 
Figure 1 which is an outline of the system. 

The device [1] is self-contained with a central area [2] for a person [3] to stand in. 
Two raised footstands [4] are provided on the floor [5] of the central area. 
Sensing equipment [6] enclosed in the device [1] is situated in front of and behind 
the person. The sensing equipment [6] is connected to a computer [7]. A display 
monitor [8] is connected to the computer. The computer is connected to the 
operator panel [11]. The computer [7] is connected to a telephone line [12]. The
device [1] is connected to mains electricity [13]. A credit card swipe reader [14] is 
connected to the computer. A colour printer [15] connected to the computer 
outputs a colour image of the Avatar [16] into a tray [17] where the person can 
recover it. Electronic scales [18] are situated under each footstand [4] and are 
connected to the computer. A microphone [19] is connected to the computer. 

Figure 2 is an outline of a preferred posture. The person [3] stands up on the 
footstands [4] with his back [34] straight. His feet [22] face forward and his legs 
[33] are straight. His arms [23] are stretched out to either side with his hands 
[25] with palms facing backwards, thumbs [27] pointing downwards and fingers 
separated. His head [29] is located using a beam of light [30] from a projector
[28], which makes a spot of light [31] on his nose [32].

Figure 3 is an illustration of the registration artifact [40]. Eight spheres [41] are 
hung in an accurately known orientation relative to each other from the top [46] 
of the central area. Four horizontal tubes [42] connect the spheres [41] in pairs. 
The tubes [42] are connected together by vertical wires [43]. A weight [44] on a 
rubber string rests on the floor [5] to help stop the artifact swinging. 

Figure 4 is an illustration of the two linear axes [50a, 50b] with 16 sensors [57] in 
four groups of four sensors. 

Figure 5 is an illustration of a sensor [57]. Each sensor [57] has one stripe [51] 
pointing at a small angle (5 to 10 degrees) upwards and a second stripe [52] 
pointing at a small angle (5 to 10 degrees) downwards. The stripes are switched 
so only one stripe is illuminated at a time thus removing the chance of ambiguous 
data. All sensors have two cameras [53, 54]. Each stripe is generated by a laser 
diode collimator [55] with a visible red 670 nm laser diode and a fixed cylindrical 
optic [58]. Each laser diode is Class II which is eye-safe. A colour camera [58] 
and a flash light [59] are mounted on the sensor [57].

Figure 6 is a plan view of the device [1] with the person [3] standing in the
recommended posture such that his body, head, arms, hands and legs are 
approximately aligned with a near-vertical plane P. The sensors [57] are rigidly 
attached to one of the two axes [50a, 50b]. The positions of the four sets of 
sensors [57] in Figure 6 can be optimised to recover the maximum amount of the 
surface area of the person. 

Modes for carrying out the invention 

A configuration for the Avatar Kiosk is shown in Figure 1. This configuration is
only one of many modes for carrying out the invention. 

Industrial Applicability 

The invention can be applied in the form of kiosks. Such kiosks can be placed in 
accessible positions such as airports, amusement centres, shops and health 
centres throughout the world. The kiosks will be connected to a 
telecommunications network to assure the transfer of the financial transaction and 
the Avatar dataset. People can use a credit card or electronic money to pay for 
the transaction. The whole process will take a few minutes. 

Applications of the resulting Avatar datasets include: the 'passport' or person's
identity in cyberspace or virtual environments, health and beauty treatment,
teleshopping, virtual meetings, lonely hearts, games, clothing size specifications, 
medical treatments and guaranteeing financial transactions. 

It is likely that people will wish to repeat the process at different times in life, or
when wearing different clothes or with different hairstyles and makeup.